Skip to content

Cache aware for pd#1379

Open
kingder wants to merge 5 commits into
mainfrom
cache_aware
Open

Cache aware for pd#1379
kingder wants to merge 5 commits into
mainfrom
cache_aware

Conversation

@kingder

@kingder kingder commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

support cache aware for prefill selection.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a cache-aware routing policy for the prefill-decode master, utilizing a high-performance radix tree implemented in Rust with PyO3 bindings to route requests to cache-hot nodes. Key feedback highlights critical issues, including an undefined self._lock attribute in CacheAwarePolicy that will cause server crashes, potential deadlocks in the Rust tree implementation when holding read locks during child removal, and a missing update_nodes override in LoadBalancedCacheAwareSelector that prevents worker initialization. Additionally, the reviewer noted a potential ValueError when selecting from empty workers and pointed out that the cumulative processed_prompt_len load metric lacks a decay mechanism, which will degrade load balancing accuracy over time.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread lightllm/server/httpserver_for_pd_master/pd_selector/cache_aware.py Outdated
Comment thread rust/pd_tree/src/lib.rs
Comment on lines +839 to +845
if node.children.is_empty() && node.tenant_last_access_time.is_empty() {
if let Some(ref parent) = parent_opt {
if let Some(fc) = node.text.read().unwrap().first_char() {
parent.children.remove(&fc);
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Holding the read lock on node.text (via node.text.read().unwrap()) while calling parent.children.remove(&fc) creates a deadlock risk. A concurrent insert thread can hold the write lock on parent.children and wait for the write lock on node.text, while this thread holds the read lock on node.text and waits for the write lock on parent.children. To prevent this deadlock, retrieve fc first and release the read lock before removing the child from the parent.

Suggested change
if node.children.is_empty() && node.tenant_last_access_time.is_empty() {
if let Some(ref parent) = parent_opt {
if let Some(fc) = node.text.read().unwrap().first_char() {
parent.children.remove(&fc);
}
}
}
if node.children.is_empty() && node.tenant_last_access_time.is_empty() {
if let Some(ref parent) = parent_opt {
let fc = node.text.read().unwrap().first_char();
if let Some(fc) = fc {
parent.children.remove(&fc);
}
}
}

Comment thread rust/pd_tree/src/lib.rs
Comment on lines +919 to +925
if curr.children.is_empty() && curr.tenant_last_access_time.is_empty() {
if let Some(ref parent) = parent_opt {
if let Some(fc) = curr.text.read().unwrap().first_char() {
parent.children.remove(&fc);
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Holding the read lock on curr.text (via curr.text.read().unwrap()) while calling parent.children.remove(&fc) creates a deadlock risk with concurrent insert operations. To prevent this deadlock, retrieve fc first and release the read lock before removing the child from the parent.

Suggested change
if curr.children.is_empty() && curr.tenant_last_access_time.is_empty() {
if let Some(ref parent) = parent_opt {
if let Some(fc) = curr.text.read().unwrap().first_char() {
parent.children.remove(&fc);
}
}
}
if curr.children.is_empty() && curr.tenant_last_access_time.is_empty() {
if let Some(ref parent) = parent_opt {
let fc = curr.text.read().unwrap().first_char();
if let Some(fc) = fc {
parent.children.remove(&fc);
}
}
}

Comment thread lightllm/server/httpserver_for_pd_master/pd_selector/pd_selector.py Outdated
Comment thread lightllm/server/httpserver_for_pd_master/pd_selector/cache_aware.py
Comment on lines +72 to +73
def update_load(self, prompt_len: int):
self.processed_prompt_len += prompt_len

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The processed_prompt_len load metric is strictly cumulative and is never decremented or decayed when requests finish. Over time, this means a worker that historically processed many requests will always be considered highly loaded, even if it is currently completely idle. Consider implementing a decay mechanism or decrementing the load when requests complete to ensure accurate active load balancing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant